Improving Population-specific Allele Frequency Estimates by Adapting Supplemental Data: an Empirical Bayes Approach.
Abstract
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than that of either pooling or not pooling, and it sometimes improves substantially over both extremes. The bias introduced is small, as shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation likely to arise in a genome-wide association study.
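The trade-off described in the abstract, between the lower variance of pooling and the bias from population differences, can be illustrated with a generic shrinkage estimator. The sketch below is not the authors' procedure, which the abstract does not specify. It assumes a simple method-of-moments weight shared across markers, binomial sampling of alleles, and hypothetical names (`eb_shrink`, `tau2`, `w`). The weight on the supplemental sample grows when the two samples agree across markers and shrinks toward zero when they diverge.

```python
def eb_shrink(primary_counts, n_primary, supp_counts, n_supp):
    """Shrink per-marker primary allele frequencies toward supplemental ones.

    primary_counts[j] / n_primary is the raw primary estimate at marker j;
    supp_counts[j] / n_supp is the supplemental estimate. A single weight w
    in [0, 1] is estimated from all markers by the method of moments.
    This is an illustrative assumption, not the estimator from the paper.
    """
    p_raw = [x / n_primary for x in primary_counts]
    q_raw = [y / n_supp for y in supp_counts]
    k = len(p_raw)

    # Average squared disagreement between the two samples across markers.
    mean_sq_diff = sum((p - q) ** 2 for p, q in zip(p_raw, q_raw)) / k

    # Plug-in binomial sampling variances, averaged over markers.
    var_primary = sum(p * (1 - p) for p in p_raw) / (k * n_primary)
    var_supp = sum(q * (1 - q) for q in q_raw) / (k * n_supp)

    # Disagreement not explained by sampling noise is attributed to true
    # frequency differences between the populations (truncated at zero).
    tau2 = max(mean_sq_diff - var_primary - var_supp, 0.0)

    # Weight that minimises the MSE of (1 - w) * p_raw + w * q_raw under
    # this simple model; guard against a zero denominator.
    denom = var_primary + var_supp + tau2
    w = var_primary / denom if denom > 0 else 0.0

    estimates = [(1 - w) * p + w * q for p, q in zip(p_raw, q_raw)]
    return estimates, w


# Usage: 20 primary alleles and 200 supplemental alleles at three markers.
estimates, w = eb_shrink([5, 10, 3], 20, [48, 104, 35], 200)
print(w, estimates)
```

When the samples are indistinguishable (`tau2` is zero), the weight reduces to the inverse-variance weight, which is close to plain pooling. When they differ strongly, the estimate stays near the unpooled primary frequency.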
Related resources
Empirical Bayes procedure for estimating genetic distance between populations and effective population size.
We developed an empirical Bayes procedure to estimate genetic distances between populations using allele frequencies. This procedure makes it possible to describe the skewness of the genetic distance while taking full account of the uncertainty of the sample allele frequencies. Dirichlet priors of the allele frequencies are specified, and the posterior distributions of the various composite par...
Empirical Bayes Estimators with Uncertainty Measures for NEF-QVF Populations
The paper proposes empirical Bayes (EB) estimators for simultaneous estimation of means in the natural exponential family (NEF) with quadratic variance functions (QVF) models. Morris (1982, 1983a) characterized the NEF-QVF distributions which include among others the binomial, Poisson and normal distributions. In addition to the EB estimators, we provide approximations to the MSE’s of t...
Empirical Bayes Analysis of Two-Factor Experiments Under Inverse Gaussian Model
A two-factor experiment with interaction between factors wherein observations follow an Inverse Gaussian model is considered. Analysis of the experiment is approached via an empirical Bayes procedure. The conjugate family of prior distributions is considered. Bayes and empirical Bayes estimators are derived. Application of the procedure is illustrated on a data set, which has previously been an...
An application of the empirical Bayes approach to directly adjusted rates: a note on suicide mapping in California.
A simple, reliable, and comparable measure for suicide mapping and other health problems is needed. Because standardized mortality ratios (SMRs) may not indicate the relative meaning of their magnitudes when compared with one another, and statistical significance levels of tests for SMRs overlook the areas that have small populations, neither of these approaches provides a satisfactory index. T...
Quigley, John and Bedford, Tim and Walls, Lesley (2007) Estimating rate of occurrence of rare events with empirical Bayes: a railway application. Reliability Engineering and System
Classical approaches to estimating the rate of occurrence of events perform poorly when data are few. Maximum Likelihood Estimators result in overly optimistic point estimates of zero for situations where there have been no events. Alternative empirical based approaches have been proposed based on median estimators or noninformative prior distributions. While these alternatives offer an improve...
Journal title:
- The Annals of Applied Statistics
Volume 1, Issue 2
Pages: -
Publication date: 2007